Mining non-lattice subgraphs for detecting missing hierarchical relations and concepts in SNOMED CT

Authors

  • Licong Cui
  • Wei Zhu
  • Shiqiang Tao
  • James T. Case
  • Olivier Bodenreider
  • Guo-Qiang Zhang
Abstract

Objective: Quality assurance of large ontological systems such as SNOMED CT is an indispensable part of the terminology management lifecycle. We introduce a hybrid structural-lexical method for scalable and systematic discovery of missing hierarchical relations and concepts in SNOMED CT.

Material and Methods: All non-lattice subgraphs (the structural part) in SNOMED CT are exhaustively extracted using a scalable MapReduce algorithm. Four lexical patterns (the lexical part) are identified among the extracted non-lattice subgraphs. Non-lattice subgraphs exhibiting such lexical patterns are often indicative of missing hierarchical relations or concepts. Each lexical pattern is associated with a potential specific type of error.

Results: Applying the structural-lexical method to SNOMED CT (September 2015 US edition), we found 6801 non-lattice subgraphs that matched these lexical patterns, of which 2046 were amenable to visual inspection. We evaluated a random sample of 100 small subgraphs, of which 59 were reviewed in detail by domain experts. All the subgraphs reviewed contained errors confirmed by the experts. The most frequent type of error was missing is-a relations due to incomplete or inconsistent modeling of the concepts.

Conclusions: Our hybrid structural-lexical method is innovative and proved effective not only in detecting errors in SNOMED CT, but also in suggesting remediation for these errors.
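The abstract does not spell out the structural definition, so the following is only a minimal sketch. It assumes the definition commonly used in this line of work: two concepts form a non-lattice pair when they have more than one maximal common descendant in the is-a hierarchy. The function names and the toy graph are hypothetical, and the brute-force pairwise scan stands in for the scalable MapReduce algorithm the authors describe; it is not their implementation.

```python
from itertools import combinations


def down_set(children, node):
    """Return node plus all of its descendants in an is-a DAG.

    `children` maps a concept to the list of its direct is-a children
    (a hypothetical in-memory representation, assumed for illustration).
    """
    seen = {node}
    stack = [node]
    while stack:
        cur = stack.pop()
        for child in children.get(cur, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def maximal_common_descendants(children, a, b):
    """Return the common lower bounds of a and b that have no common lower bound above them."""
    common = down_set(children, a) & down_set(children, b)
    return {
        c for c in common
        if not any(d != c and c in down_set(children, d) for d in common)
    }


def non_lattice_pairs(children):
    """Brute-force scan: map each non-lattice pair to its maximal common descendants."""
    nodes = set(children) | {c for kids in children.values() for c in kids}
    result = {}
    for a, b in combinations(sorted(nodes), 2):
        mcd = maximal_common_descendants(children, a, b)
        if len(mcd) > 1:  # more than one maximal lower bound: no unique meet
            result[(a, b)] = mcd
    return result


if __name__ == "__main__":
    # Toy hierarchy: A and B both subsume C and D, which both subsume E.
    toy = {"A": ["C", "D"], "B": ["C", "D"], "C": ["E"], "D": ["E"]}
    print(non_lattice_pairs(toy))
```

In the toy hierarchy, A and B share the two maximal common descendants C and D, so the pair is flagged. In the method described above, the subgraph spanned by such a pair would then be checked against the lexical patterns to suggest a missing is-a relation or an intermediate concept.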

Similar resources

Auditing SNOMED CT hierarchical relations based on lexical features of concepts in non-lattice subgraphs

OBJECTIVE We introduce a structural-lexical approach for auditing SNOMED CT using a combination of non-lattice subgraphs of the underlying hierarchical relations and enriched lexical attributes of fully specified concept names. Our goal is to develop a scalable and effective approach that automatically identifies missing hierarchical IS-A relations. METHODS Our approach involves 3 stages. In ...

Detecting Misaligned and Missing Concepts in SNOMED CT using Structural and Lexical Patterns

Objective: Quality assurance of large ontological systems such as SNOMED CT is an indispensable part of the terminology management lifecycle. We introduce a hybrid structural-lexical method for scalable and systematic discovery of novel anomalies in SNOMED CT. The structural component is based on shared isa relations to other concepts. The lexical component leverages shared words in description...

Identifying Missing Hierarchical Relations in SNOMED CT from Logical Definitions Based on the Lexical Features of Concept Names

Objectives. To identify missing hierarchical relations in SNOMED CT from logical definitions based on the lexical features of concept names. Methods. We first create logical definitions from the lexical features of concept names, which we represent in OWL EL. We infer hierarchical (subClassOf) relations among these concepts using the ELK reasoner. Finally, we compare the hierarchy obtained from...

Identifying Potentially Missing Hierarchical Relations in SNOMED CT based on Lexical Features - Impact of Synonyms and Lexico-syntactic Constraints

Introduction: The quality assurance of large bio-ontologies is extremely critical for their effective and continued use and is an active area of research [1]. For example, recent investigations highlighted issues in the hierarchical structure of SNOMED CT and its detrimental effects on biomedical applications [2]. Previous work by one of the authors [3] established a method to identify potentially missin...

NEO: Systematic Non-Lattice Embedding of Ontologies for Comparing the Subsumption Relationship in SNOMED CT and in FMA Using MapReduce

A structural disparity of the subsumption relationship between FMA and SNOMED CT's Body Structure sub-hierarchy is that while the is-a relation in FMA has a tree structure, the corresponding relation in Body Structure is not even a lattice. This paper introduces a method called NEO, for non-lattice embedding of FMA fragments into the Body Structure sub-hierarchy to understand (1) this structura...

Journal:
  • Journal of the American Medical Informatics Association : JAMIA

Volume 24, Issue 4

Pages: -

Publication date: 2017